Cloudflare detecting Puppeteer #841

Open
@joeledwardson

Description

I have not queried or clicked anything using Puppeteer; simply connecting to the browser seems to be enough for Cloudflare to block access to a site.

I have used the simplest possible example in Puppeteer with a real browser (not headless) and no automation scripts.

import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
puppeteer.use(StealthPlugin())

;(async () => {
  console.log('launching...')
  const browser = await puppeteer.launch({
    executablePath: 'C:/Program Files/Google/Chrome/Application/chrome.exe',
    headless: false,
    defaultViewport: null
  })
  console.log('connected')
  const page = await browser.newPage()
  await page.goto('https://nowsecure.nl')
  console.log('waiting for 1 min...')
  await new Promise((r) => setTimeout(r, 60_000))
  console.log('closing...')
  await browser.close()
})()

When I repeat this without Puppeteer and click the Cloudflare verification button, I pass through to the website, so I suspect that Cloudflare is somehow able to detect Puppeteer.

The video below shows me clicking the button manually, yet Cloudflare still refuses access:
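To narrow down what differs between the two runs, one option is to compare a couple of commonly cited automation signals in both browsers. The sketch below is only a diagnostic aid: `NavigatorLike` and `automationSignals` are hypothetical names, and nothing here is a claim about what Cloudflare actually inspects.

```typescript
// Hypothetical subset of the browser's `navigator` object used for the check.
interface NavigatorLike {
  webdriver?: boolean
  userAgent: string
}

// Returns a human-readable list of automation hints found in the given values.
function automationSignals(nav: NavigatorLike): string[] {
  const signals: string[] = []
  // `navigator.webdriver` is true when the browser reports being automated.
  if (nav.webdriver === true) {
    signals.push('navigator.webdriver is true')
  }
  // Headless Chrome advertises itself in the user agent string.
  if (/HeadlessChrome/.test(nav.userAgent)) {
    signals.push('user agent contains HeadlessChrome')
  }
  return signals
}
```

With Puppeteer, the input could be collected with `page.evaluate(() => ({ webdriver: navigator.webdriver, userAgent: navigator.userAgent }))` and passed to `automationSignals`. An empty result does not mean the browser is undetectable, only that these two hints are absent.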

Just.a.moment.-.Google.Chrome.2023-09-27.16-04-47.mp4

I have also replicated this on Android: I forwarded the Chrome DevTools port via ADB, connected to the debugging port, and got the same result.

For mobile, I:

  • use ADB to forward the Chrome DevTools port: adb forward tcp:9000 localabstract:chrome_devtools_remote
  • run the following script to connect with Puppeteer:
import { Browser, connect } from 'puppeteer-core'

let browser: Browser | null = null

const timer = (ms: number) => new Promise<null>((res) => setTimeout(() => res(null), ms))

export async function puppeteerConnect({
  port,
  queryTimeoutMs
}: {
  port: string
  queryTimeoutMs: number
}): Promise<Browser> {
  const debuggerUrl = 'http://127.0.0.1:' + port + '/json/version'

  const fetcher = async () => {
    const result = await fetch(debuggerUrl)
    return await result.text()
  }

  const result = await Promise.race([timer(queryTimeoutMs), fetcher()])
  if (result === null) {
    throw new Error('get debugger URL timed out')
  }

  const data = JSON.parse(result) as { webSocketDebuggerUrl?: unknown }

  const wsUrl = data?.webSocketDebuggerUrl
  if (typeof wsUrl !== 'string') {
    throw new Error('get debugger url from response failed, `wsUrl` is not string')
  }

  // use the WebSocket URL to connect with puppeteer
  const browser = await Promise.race([
    connect({
      browserWSEndpoint: wsUrl,
      defaultViewport: null
    }),
    timer(queryTimeoutMs)
  ])
  if (browser === null) {
    throw new Error('puppeteer connect timed out')
  }
  return browser
}

async function retryConnect() {
  let lastErr: unknown = null
  let i = 0
  while (i < 20) {
    console.log('connection attempt #', i)
    try {
      return await puppeteerConnect({ port: '9000', queryTimeoutMs: 500 })
    } catch (err) {
      lastErr = err
    }
    await new Promise((r) => setTimeout(r, 1000))
    i += 1
  }
  throw lastErr
}

;(async () => {
  console.log('connecting...')
  const _browser = await retryConnect()
  console.log('connected!')
  browser = _browser
  const pages = await browser.pages()
  const firstPage = pages[0]
  if (!firstPage) {
    throw new Error('NO PAGE')
  }
  await firstPage.goto('https://nowsecure.nl')

  await new Promise((r) => setTimeout(r, 60_000))
})().finally(() => {
  console.log('browser disconnecting')
  browser?.disconnect()
  console.log('should be done?')
})
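The parsing step in the script above can be pulled out into a small pure function, which makes it easier to check without a device attached. This is a sketch only: `parseWebSocketDebuggerUrl` is a hypothetical helper name, and the extra check that the value looks like a `ws://` or `wss://` URL is an addition that the original script does not perform.

```typescript
// Hypothetical helper: extract the WebSocket debugger URL from the text body
// returned by the /json/version endpoint.
function parseWebSocketDebuggerUrl(body: string): string {
  let data: unknown
  try {
    data = JSON.parse(body)
  } catch {
    throw new Error('debugger response is not valid JSON')
  }
  if (typeof data !== 'object' || data === null) {
    throw new Error('debugger response is not a JSON object')
  }
  const wsUrl = (data as { webSocketDebuggerUrl?: unknown }).webSocketDebuggerUrl
  // Added check (assumption): reject values that are not ws:// or wss:// URLs.
  if (typeof wsUrl !== 'string' || !/^wss?:\/\//.test(wsUrl)) {
    throw new Error('`webSocketDebuggerUrl` is missing or not a WebSocket URL')
  }
  return wsUrl
}
```

Inside `puppeteerConnect`, the `JSON.parse` and `typeof wsUrl` lines could then be replaced by a single call to this function on the fetched text.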

Activity

NodePuppeteer commented on Sep 28, 2023

Try using the start-up tab and see if it works. We have more info on this problem here: #832
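A minimal sketch of that suggestion, assuming "start-up tab" means the tab Chrome opens at launch rather than one created with `browser.newPage()`. `BrowserLike` and `getStartupPage` are hypothetical names; the interface only mirrors the `pages()` and `newPage()` methods that Puppeteer's `Browser` provides.

```typescript
// Minimal structural type so the helper accepts any Puppeteer-like browser object.
interface BrowserLike<P> {
  pages(): Promise<P[]>
  newPage(): Promise<P>
}

// Prefer the tab that was opened at launch; open a new tab only if none exists.
async function getStartupPage<P>(browser: BrowserLike<P>): Promise<P> {
  const pages = await browser.pages()
  if (pages.length > 0) {
    return pages[0]
  }
  return browser.newPage()
}
```

In the first script this would replace `await browser.newPage()` with `await getStartupPage(browser)`. The Android script in this issue already takes the first entry of `browser.pages()`.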

krkeegan commented on Dec 12, 2023

I have recently (within the last two weeks) started seeing the exact same thing. Using the start-up tab doesn't seem to make a difference.

bajgit98 commented on Jun 7, 2024

> I have recently (within the last two weeks) started seeing the exact same thing. Using the start-up tab doesn't seem to make a difference.

I had luck up until now. Now anything protected by Cloudflare simply doesn't let me do anything: even if I solve the CAPTCHA myself, it keeps spinning or reports that I've failed the human verification test.

Has anyone had any luck resolving this issue?

mdervisaygan commented on Jun 7, 2024

> > I have recently (within the last two weeks) started seeing the exact same thing. Using the start-up tab doesn't seem to make a difference.
>
> I had luck up until now. Now anything protected by Cloudflare simply doesn't let me do anything: even if I solve the CAPTCHA myself, it keeps spinning or reports that I've failed the human verification test.
>
> Has anyone had any luck resolving this issue?

https://medium.com/@zfcsoftware/how-to-bypass-cloudflare-with-node-js-869fa6e21dd5

vladtreny commented on Jun 8, 2024

Friend, your article is absolutely wrong... You completely do not understand the cause of this issue.
Please stop spamming these threads.

mdervisaygan commented on Jun 8, 2024

> Friend, your article is absolutely wrong... You completely do not understand the cause of this issue. Please stop spamming these threads.

The article is about passing Cloudflare. Two pieces of code are given, and both can easily pass, including on the corporate plan. Which part is wrong? I am sharing a source because people constantly say that we cannot pass Cloudflare. Explain the wrong part and let's learn together. Also, I'm not spamming: my first message was a link to a GitHub discussion that has nothing to do with me, and there are dozens of people in that discussion. I am waiting for you to explain what is wrong.

Kosmoon commented on Jun 8, 2024

I had this issue; some websites have more advanced scraper detection. The solution was to use a residential proxy service like Bright Data and pass the proxy args to Puppeteer.

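// Assumption: the original comment omits imports; these are for plain `puppeteer`
// (a version that still exports `PuppeteerLaunchOptions` and accepts `headless: 'new'`).
import puppeteer, { PuppeteerLaunchOptions } from 'puppeteer';

const BROWSER_CONFIG: PuppeteerLaunchOptions = {
  headless: 'new',
  defaultViewport: null,
  ignoreHTTPSErrors: true,
  // host:port of the residential proxy (placeholder kept from the original)
  args: ['--proxy-server=xxxx:xxxx'],
};

const browser = await puppeteer.launch(BROWSER_CONFIG);
const page = (await browser.pages())[0];

// Supply the proxy credentials before navigating anywhere
await page.authenticate({
  username: 'xxxxx',
  password: 'xxxxxx',
});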
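If the proxy details come from configuration rather than hard-coded strings, the launch argument and the credentials can be derived from one object. This is a sketch under assumptions: `ProxySettings` and `buildProxyLaunch` are hypothetical names, not part of Puppeteer; only the `--proxy-server` flag and `page.authenticate` come from the snippet above.

```typescript
// Hypothetical shape for proxy settings; not part of Puppeteer's API.
interface ProxySettings {
  host: string
  port: number
  username?: string
  password?: string
}

// Builds the Chromium launch args and, when both are present, the credentials
// object to hand to page.authenticate().
function buildProxyLaunch(proxy: ProxySettings): {
  args: string[]
  credentials: { username: string; password: string } | null
} {
  const args = [`--proxy-server=${proxy.host}:${proxy.port}`]
  const credentials =
    proxy.username !== undefined && proxy.password !== undefined
      ? { username: proxy.username, password: proxy.password }
      : null
  return { args, credentials }
}
```

The returned `args` would go into the launch options, and a non-null `credentials` would be passed to `page.authenticate(credentials)` before the first `page.goto`.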

monsterlady commented on Jul 27, 2024

zfcsoftware

bruh, the method this blog introduced does not work for me

mdervisaygan commented on Jul 27, 2024

> zfcsoftware
>
> bruh, the method this blog introduced does not work for me

Try the latest version of puppeteer-real-browser; it has just been updated, so you should not have any problems. If you are using Linux, I recommend running it with Docker.

Windows Server Test:
https://github.com/user-attachments/assets/b1c4dca1-db48-4692-ac67-fc399d11e009

Ubuntu 24 test:
https://github.com/user-attachments/assets/b1040e6a-9d8d-4fed-910a-52cabbd82130


Cloudflare detecting Puppeteer · Issue #841 · berstend/puppeteer-extra